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ABSTRACT 

Stochastic  approximation  procedures  are  considered  for  the  estimation  of 
parameters  using  incomplete  data.  One  procedure  is  stated  and  illustrated 
which  often  leads  to  asymptotically  efficient  estimators.  Others  are 
developed  which,  although  possibly  not  optimal  in  the  above  sense,  will  be 
very  much  easier  to  apply.  This  will  be  particularly  advantageous  when  quick 
recursive  estimates  are  required.  Examples  are  given  and  a  link  is  made 
between  one  of  the  sub-optimal  methods  and  the  EM  algorithm. 
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SIGNIFICANCE  AND  EXPLANATION 


Many  statistical  problems  involve  the  estimation  of  parameters  in  a  model 
using  data  which  are  incomplete.  For  instance,  some  values  may  be  missing 
altogether  or  they  may  be  "censored"  in  that  their  exact  values  are  not  known 
but  are  known  to  fall  in  a  specified  range. 

Almost  without  fail,  estimation  using  such  data  is  significantly  more 
awkward  than  if  they  were  complete  and,  although  numerical  methods  are 
available,  there  is  scope  for  faster  procedures,  even  if  the  resulting 
estimates  may  not  be  quite  as  "optimal".  This  paper  describes  methods  which 
incorporate  the  data  one  at  a  time  into  the  estimation  procedure.  This  leads 
to  recursive  estimates  which  may  well  be  desirable  in  themselves,  if  the  data 
do  arrive  sequentially.  The  procedures  described  are  of  the  "stochastic 
approximation"  type,  for  which  extensive  theory  exists. 

Most  emphasis  is  placed  on  two  such  recursions,  one  which  is  asymptotical 
optimal  and  one  which,  although  suboptimal,  will  be  very  much  simpler  from  a 
computational  point  of  view.  This  latter  method  can  also  be  neatly  linked  to 
one  of  the  main  procedures  for  nonrecursive  estimation  in  incomplete  data 
problems,  the  EM  algorithm.  ~ r;, 7. ,  -or 

A  few  illustrative  examples  are  given.  1  / 
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The  responsibility  for  the  wording  and  views  expressed  in  this  descriptive 
summary  lies  with  MRC,  and  not  with  the  author  of  this  report. 


RECURSIVE  PARAMETER  ESTIMATION  USING  INCOMPLETE  DATA 


D.  M.  Titterington 
1 .  INTRODUCTION 

Parameter  estimation  using  incomplete  data  tends  to  be  much  more  awkward 
than  with  a  corresponding  set  of  complete  data*  Maximum  likelihood 
estimation/  for  instance,  usually  requires  numerical  methods,  such  as  the 
methods  of  Scoring  and  Newton  Raphson.  Dempster  et  al  (1977)  give  a 
compendium  of  incomplete  data  problems  and  describe  an  alternative  numerical 
iterative  procedure,  the  EM  algorithm,  which  has  the  mixed  blessings  of  being 
of  first  order  but  monotonic  and  easy  to  program.  If  very  large  data-sets  are 
involved,  then  numerical  procedures  can  become  very  expensive.  Their 
application  to  survey  data  with  nonresponse  could  be  a  case  in  point. 

We  shall  illustrate  here  some  alternative  recursive  procedures  in  which 
the  data  are  run  through  once,  sequentially.  Such  a  procedure  will  take  the 
form 

where  £  denotes  the  parameter ( s ) .  denotes  the  estimate  after  k 

observations  and  y^+i  denotes  the  (k+1)st  observation.  If  there  are  n 
observations  altogether,  then  the  estimate  we  would  quote  is  _9*. 

When  data  do  arrive  sequentially,  as  in  control  engineering  contexts, 
such  recursive  procedures  may  be  essential  to  give  "quick"  up-to-date 
parameter  estimates,  particularly  if  sequential  design  is  to  be  incorporated; 
see  chapter  7  of  Goodwin  and  Payne  (1977),  Titterington  (1980)  and  references 
therein.  In  the  more  usual  statistical  contexts,  we  shall  have  to  impose  some 

*  ~  "  ~ 
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ordering  on  the  data,  in  conflict,  say,  with  the  likelihood  principle 


(Anderson,  1979).  We  shall  show  that,  asymptotically,  the  ordering  is 
irrelevant  and,  in  a  later  paper  (Titterington  and  Jiang,  1982),  evidence  will 
be  presented  that  the  ordering  effect  is  not  very  important  in  moderately 
sized  samples. 

Some  simple  sequential  estimation  procedures  do  not  suffer  from  the 
criticism  of  order  dependence,  as  is  shown  by  the  following  illustrations  in 
which  there  is  no  incomplete  data. 

Example  1.1.  Independent  Bernoulli  trials 

Suppose  yt ,y2,  ...  are  independent  and  that  P(yk“1)  *  0  =  1-P(y^=0), 
k  ■  1,2,  ...  .  Then  the  recursion 

V,  -sk*  -V- — 

®0  *  0  ' 

generates  exactly  the  MLE's  of  6  as  the  data  are  incorporated. 


Example  1.2.  Exponential-family  type  models 


Suppose  yvy2,  ...  are  independent  and  that  each  has  p.d.f. 

log  f(y|£)  ■  const  +  t(y)T£  +  a(£)  , 

where  £  is  a  vector  and  t_(y)  the  vector  of  sufficient  statistics  for  £. 
Let  9  -  9(j»)  -  i(t(y)|f). 

A 

Then,  given  y^,...,/^  the  MLE,  satisfies 

A  -  I  u  ■ 

i-i 

»  ke 


where  t^  -  tCy^,  i  *  1 < 
ite  {< 

£.  +  (k+1 )_1 (t  .  -  8>,  k  -  0,1,...,  X 


We  may  calculate  {0^}  recursively  from 


s\ 

8 

-k+1  ~k 
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The  link  between  Example  1 . 2  and  recursive-least-squares  is  clear  ;  see 
also  Harrison  and  Stevens  (1976). 

In  these  examples  the  recursions  simply  give  a  convenient  way  of 
calculating  the  usual  estimates ,  and  are  unnecessary  when  considering  the 
asymptotic  theory  and  general  performance  of  the  estimators  produced.  Our 
objective  is  to  develop  a  similar  approach  to  cope  with  the  possibility  of 
incompleteness  in  the  observations. 


2.  SOME  RECURSIVE  PROCEDURES 


Suppose  y^yj,  ...  are  independent  observations,  each  with  underlying 
probability  density  function  (p.d.f.)  g(yl.0),  where  j)  e  0  c  rs,  for  some 

s.  Let  S(y,(^)  denote  the  vector  of  scores.  That  is, 

S ,  (y,9_)  *  gg-  log  gtylj)),  j  =  1,...,s  . 

3  j 

2 

Let  D  (y ,0^)  denote  the  matrix  of  second  derivatives  of  log  gCyfjU  and  let 
1(0.)  denote  the  Fisher  information  matrix  corresponding  to  one  observation. 

It  is  assumed  that  all  derivatives  and  expected  values  exist  and  that 

Eg  S(y,£)  =  /  S(y,£)  g(y|0)dy  «  0^  ; 

1(0)  -  B0{S(y,0)ST(y,O)}  -  -*0  D2(y,0)  . 

Consider  the  recursion 

££+1  "  }_1£(yk+1  *®J) •  k  *  0,1,...  (2) 

which  is  recognizable  as  a  stochastic  approximation  procedure.  Under 
regularity  conditions  over  and  above  those  alluded  to  above,  as  k  ♦  •», 

<£*  -  V  -  N(^'  I(5o)_1)  '  (3) 

in  distribution,  where  0^  denotes  the  true  parameter  value.  This  result 
appears  in  Sacks  (1958),  Fabian  (1968),  Nevel’son  and  Has'minskii  (1973, 
Chapter  8)  and  Fabian  (1978). 

We  now  state  the  conditions  required  for  the  most  useful  version  of  the 
result  in  Fabian  (1978). 

(Cl)  Continuity. 

(i)  /  (s(y,6)  -  S(y,J))}T{s(y,6)  -  S(y,j))  }g(y  |j))dy  ♦  0 

as  £  ♦  £  in  0. 

(ii)  If,  as  k  ♦  •,  0*  ♦  0  ,  then 

— k  -0 

[HO*)]'1  *  [KOg))"1  . 
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(C2 )  "Definiteness" 


-<£-9)TI<6)~1  Bq  S(y,6)  >  0  for  6  0  .  (4) 

(C3)  Boundedness 

*9II(«)"1S(y/6)»2  <  c{  1  +  II 6-0 12}  ,  (5) 

2  T  t 

where  lul  =  u  u  and  C  is  independent  of  5. 

One  further  comment  must  be  made  which  has  particular  relevance  to  some 

of  the  examples  in  Section  3,  namely  that  it  is  assumed,  in  the  theory,  that 

6,  for  all  k.  In  practice,  (2)  may  have  to  be  modified  to  ensure 

this.  For  instance,  if  0  is  a  probability  (see  Example  3.3,  for  instance) 

an  additional  constraint  should  be  added,  such  as:  e  <  0*  <  1-e,  for  all 

k  and  some  small  positive  e. 

Given  all  this,  (3)  is  guaranteed. 

If  (3)  holds  for  (2)  then  it  also  will  for 

£*+1  “  +  {<k+1)I<Q*)}~1S(yk+1,0£),  k  -  0,1,...  .  (6) 

It  is  easy  to  check  that  the  recursive  calculations  of  the  MLE's  in 

Examples  1.1  and  1.2  are  special  cases  of  (6). 

As  we  shall  see  in  some  of  the  Examples  in  Section  3,  complications  may 

arise  in  applying  recursions  (2)  and  (6),  in  the  computation  and  inversion,  in 

the  multiparameter  case,  of  I(0£).  Numerical  integration  is  often  necessary 

and  the  fact  that  we  are  dealing -with  incomplete  data  will  add  to  the 

complications.  Suppose,  with  reference  to  (2),  we  write 

\  -  {kl(0j)}_1  . 

Then  the  following  alternatives  to  suggest  themselves. 

(i)  kl(0'),  where  0'  is  an  initial  parameter  estimate  or  one  that  is 

updated  infrequently. 
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k 

(ii)  l  J  (8* ) ,  where  J.(*)  denotes  the  sample  information  matrix  from 
i-1  1  “*  1 

the  ith  observation e 


k 

(iii)  I  1(6*). 
i-1 

k 

(iv)  l  J  (0*) . 

i-1 

Suggestion  (i)  corresponds  to  a  familiar  modification  to  the  Method  of 
Scoring  for  obtaining  maximum  likelihood  estimates.  Suggestion  (ii)  is 
similar  to  Newton's  method  for  the  same  purpose.  Suggestions  (iii)  and  (iv) 
would  te  very  useful  in  providing  recursive  calculation  of  the  {v^1 }.  If 

(iv)  is  used,  for  instance,  we  obtain 


*  i<js>  •  «71 

Recursion  (2),  with  exactly  this  modification,  was  used  by  Walker  and 
Duncan  (1967)  in  the  recursive  estimation  of  parameters  in  a  linear  logistic 
model  for  quantal  response.  In  their  problem  the  observations  are  not 
identically  distributed,  so  that 


vk’  *  i  vir  • 

i-1 


They  are  particularly  fortunate,  in  that  each  1^(6.)  is  of  rank  one  so  that, 
given  VQ ,  all  other  {v^}  can  be  obtained  without  further  matrix 
inversion:  see  their  equation  (S.4). 

Theoretical  and  practical  investigation  of  these  modifications  would  be 
worthwhile. 

We  shall  concentrate,  however,  on  the  following  modification  of  (2), 
which  suggests  itself  especially  for  incomplete  data  problems. 


where  I  (6_)  denotes  the  Fisher  Information  matrix  corresponding  to  a 
c  ~ 

complete  observation.  For  future  reference  we  denote  by  equation  (9)  the 
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version  of  (8)  corresponding  to  (3).  Although  these  recursions  will  not  lead 
to  asymptotic  efficiency,  conditions  (4)  and  (3)  sometimes  guarantee  in¬ 
consistency  and  asymptotic  Normality.  We  extract  the  following  theorem  from 
Sacks  (1958)  and  Fabian  (1968).  We  state  the  univariate  version,  for  future 
application  to  the  first  three  examples  in  Section  3. 

Theorem  1 . 

Given  conditions  corresponding  to  those  above  and  provided 

ailVVV'  >  ’> 

/k  (6  -  A  )  N(0,  I  (A  )‘2i(A  )/{2I(0n)I  (A.)-1  -  1}) 

K  U  CU  0  UCU 

in  distribution  as  k  ♦  *. 

As  will  become  clear,  it  does  not  always  happen  that  21 (8Q)  >  i  (A  ). 
Suppose 

0  <  6  <  21{6  )/t  (8  )  <  1 
U  CO 

and  we  consider  the  recursion 

®k+1  *  ®k  +  kJ/2<1+8)Ic(Ak)"1s(y,A]{),  k  -  0,1,...  .  (10) 

Then,  according  to  Fabian  (1968), 

kp/2(0k  -  e0)  ♦  n( o ,  ic(e0)”2KA0)/{2i(e0)ic(80)"1  -  A}) 

in  distribution,  as  k  *  •. 

Thus,  provided  there  is  some  information  in  the  incomplete  data 
(I(0O)  >0),  a  modified  version  of  (8)  leads  to  a  consistent,  asymptotically 
Normal  estimator. 

Multidimensional  versions  of  these  results  will  be  required  in  a  complete 
study  of  Example  3.4  but  this  will  not  be  undertaken  in  the  present  paper: 
see  Sacks  (1958)  and  Fabian  (1968). 

The  important  practical  advantage  of  recursions  (8),  (9)  and  (10)  is  that 
I  (j))  will  usually  be  much  easier  to  evaluate  and,  if  a  matrix,  to  invert, 
than  1((>). 


In  the  following  section  we  derive  versions  of  some  of  these  recursions 
for  a  few  simple  examples  involving  incomplete  data.  As  y^,  y2,..* 
represents  a  sequence  of  incomplete  observations,  so  x1 ,  Xj,...  will  denote 
corresponding  "complete"  versions.  Thus,  given  y,  x  belongs  to  a  subset 
X(y)  of  the  overall  sample  space  X  and,  if  f(x|£)  denotes  the  p.d.f. 
of  x,  then 

g(y|8)  a  /X(yjf (x|0l)dx  ; 

See  Dempster  et  al  <1977). 
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3 .  SOME  EXAMPLES 


Example  3.1.  Trinomial  with  incompletely  classified  observations. 

Independent  observations  are  obtained  from  a  trinomial,  with  cell 
probabilities  |  6,  |  0,  1-0  (0  <  0  <  1).  However,  all  that  is  known  is 
whether  or  not  the  observation  belongs  to  cell  1  (x  =  1  as  opposed  to  x  = 
or  3 ) .  Let 

y  =  1  if  x  =  1 

=0  if  x  =  2  or  3 

Then 

log  g  (y|0)  =  y  log(-^  ®)  +  (l-y)log(l  -  |  0)  , 

S(y,0)  =  y/0  -  ( 1  — y ) / < 2—0 ) 

and  1(0)  =  0-1(2-0)-1  . 

Recursion  (2)  is 

It  is  not  hard  to  show  that  conditions  (4)  and  (5)  of  Section  2  are 
satisfied. 

Similarly,  Ic(0)  =  ®  1 ( 1 —9 )  1  and  recursion  (8)  is 

V,  -  \  *  *"  V'-V!vA  -  • 

However,  for  all  0  <  0  <  1,  1(0) /I  (0)  =  ( 1  —  0 ) / ( 2—9 )  <4  so  Theorem  1 

c  2 

will  not  hold  and  {0^}  is  not  ^“consistent.  In  spite  of  this  it  is 
possible  to  establish  strong  consistency  of  {0^}  by  appeal  to  a  theorem  of 
Gladyshev  (1965).  Also,  for  any  9^  >  0,  a  modified  recursion  of  the  form 
(10)  can  be  used  to  obtain  a  consistent,  asymptotically  Normal  estimator. 
Example  3.2.  Censored  exponential. 

Suppose  there  is  censoring  on  the  right  at  tg  and 

log  f(x|6)  =  -log  0  -  x/0  (x  >  0,  9  >  0) 


Thus 


so  that 


y  ■  x  if  x  <  tQ  . 

Otherwise  y  is  the  knowledge  that  "x  >  t0", 

log  g( y | 6 )  «  -log  9  -  y/8  if  x  <  tn 


-tg/8,  otherwise  . 


It  turns  out  that  (2)  is 

9;+i  • e;  *  -  *;>  <v>  *  v 

*  8*  +  (k( 1  -  exp(-tQ/8*) )}  *t  ,  otherwise  , 

and  1(0)  *  {1  -  exp(-t0/8)}/82. 

Condition  (4)  is  satisfied,  its  left  hand  side  being 

(<S-0)2(1  -  exp(-t0/8))/(1  -  exp(-tQ/6))  . 

However,  the  left  hand  side  of  (5)  is 

-t  /6  -2  -t  /8  -t  /8 

(t  -  e  0  )  {(l  -  e  0  )<202-286+62)  ♦  2(0-«)toe  °  }  , 

which  tends  to  infinity  as  &  *  0.  If,  however,  we  restrict  <$  >  e  >  0 

condition  (5)  will  hold. 

>2  i 

Since  1^(8)  *  6  ,  Theorem  1  holds  provided  1  -  exp(-t0/8fl)  >  — 


is,  if  tQ  >  0Q  log  2.  Recursion  (8)  is 

-1, 


Vi  "  ®k  +  k  (yk+1  ’  9k)  (Vl  <  V 


.1 

9.  +  k  V 

k  0 


(otherwise)  . 


Again,  however,  Gladyshev's  theorem  shows  strong  consistency  of  { 
for  any  tQ  >  0.  Recursions  like  (10)  may  also  be  considered. 

Example  3.3.  Estimation  of  mixing  weights. 

He  consider  the  case  of  a  mixture  of  d  known  densities 
{fjO,  j  -  1 ,  —  ,d} . 

d-1  d-1 

9<yll>  -  I  *  (i  -  I  Of  fy>  , 

j-i  33  j-i  3 


,  that 
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where  the  8  ,...,6  are  all  nonzero  probabilities.  Then 
i  a 

s^yie)  -  {£j (y)  -  fd<y)}/g<y|0),  j  -  , 

£jr(yi!5  *  -{*j(y)  -  fd(y)Hfr(y)  -  fd(y))/{g(y) I®)}2  . 

j  ■  1,...,d-1,  r  **  1,...,d-1 

and 

ijr<0)  m  I  -  fd<y)}*Vy>  -  *dfy»«(yl®>_1«y  , 

j  ,  T  ■  1  /  •  •  .  ,  !t“  1  . 

Verification  of  the  regularity  conditions  is  subsumed  in  Kazakos  (1977) 
and  smith  and  Makov  (1978).  For  the  special  case  of  d  -  2,  with  e,|  *  0, 
we  obtain,  for  (2),  as  in  Kazakos  (1977), 

«;*,  *  K  *  <««•*»-' <f,(yM1)  -  V*k..,)/«<V.i',k1'  *  -  i.J . 

with 

i(8)  -  /  (f , (y)  -  f2(y))2g(y|8)-1dy  . 

We  maintain  our  concentration  on  the  case  d  *  2. 

Here  the  incompleteness  is  caused  by  ignorance  of  the  source  of  an 
observed  y ;  is  it  component  1  or  component  2?  We  may  write 

x  “  (y ,z)  , 

where  £  *  (1,0)  or  (0,1)  according  to  the  source.  Thus 

log  f(x|6)  *  zTu(8)  +  *Tv(0) 

where 

uT(8)  -  (log  0,  log( 1-8 ) ) 

and  vT(0)  ■  (log  f^ty),  log  f 2 (y ) ) 

Thus,  1  (8)  ■  1/8 (1-8)  and  (8)  becomes 
c 

V.  ■  \  *  k‘'  V'-V(Vyk.,>  -  VlW,/’|yk.ilV  •  <” 

Asymptotically,  if  1(8)  >4  I  (8),  Theorem  1  holds.  Otherwise,  strong 

2  C 

consistency  can  still  be  guaranteed  (see  Makov  and  Smith  (1977),  Smith  and 
Makov  (1978))  and  recursions  like  (10)  may  also  be  used. 
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Example  3.4.  Mixture  of  two  univariate  Normals. 

Let 

g<yl®#M,£)  *  ®1p(ylw1  )  ♦  62p(y|y2/4>2) 

*  91P1<y)  ♦  62P2<y)  # 

where  0  <  8^  »  1-®2  <  1  and 

p(y|y,$)  *  ( 2x$)  ^2exp{-  ~  (x-y)2/$)  . 

Then  the  component  of  the  score  vector  are 

3  log  g(y)/3®1  *  (p^y)  -  P2(y>)/g(y>  / 

3  log  g(y)/3y^  *  (y-y^)w^(y )/b^t  j  “  1/2, 

3  log  g(y)/3$^  *  {(y-y^)2  “  (y)/2t2,  j  *  1,2, 

where  w^(y)  *  ®^p^ (y)/g(y) ,  j  “  1/2. 

Note  that,  for  j  -  1,2,  w^(y)  is  the  conditional  probability  that  an 
observation  comes  from  component  j,  given  its  datum  value,  y. 

Here  we  do  not  go  with  the  details  of  the  verification  of  conditions  (4) 
and  (5).  They  will  be  complicated,  as  is  application  of  the  recursion  (2), 
itself,  because  the  information  matrix  is  a  complicated  matrix,  even  for 
univariate  mixtures,  let  along  multivariate  ones.  As  in  Example  3.3, 
numerical  integration  is  necessary t  see  Behboodian  (1972). 

To  point  out  this  awkwardness  in  application  is  the  main  reason  for 
mentioning  this  example.  Zt  motivates  strongly  the  use  of  recursions  like 
(8).  For  this  we  require  Ic(  9^  . 

Again  x  *  <y,z)  and  now 

log  f(x|9,£,£)  =*  zTu(£)  +  zTv(y,£)  , 

where,  for  instance, 

v^Ji/i)  -  log  p(y|Uj,+j)  . 
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|(k+1)  _  2(k) 


r(k) 


+  {k8u'rvK'(y  hy 

j  v  j  J  j  iyk+1,vyk+1  wj  ' 

r(k+1 )  r(k)  ,  j  » (k)  •. -1  (k),  ,,,  „(k), 

♦j  **j  +  U®j  }  Wj  (yk+1){(yk+1  “  Uj  1 


.  (k),  ,  n(k)  .  ,  (k)  , (k) .  .  ,  , A (k)  (k)  ,<k), 

where  w^  (y)  «*  0^  p(ylw^  ,  )/g( y|6  ,  P  ,  £  ] 


I 


then 


2  - 

,  j  -  1,2 
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4.  A  CONNECTION  WITH  THE  EM  ALGORITHM 
As  pointed  out  by  Fabian  (1978,  Section  5.8),  there  is  a  strong 
relationship  between  recursion  (2)  and  the  Method  of  Scoring.  Recursion  (8), 
on  the  other  hand,  is  similarly  linked  to  the  EM  algorithm. 

Suppose  x1r...,xn  represent  n  independent  complete  observations, 

corresponding  to  y.|,...,yn«  Define 

n 

Q(£l£')  “*8|{  l  log  f(x  |£)  ly1#...,y  >  . 

-  i-1 

The  EM  algorithm  generates  a  sequence  {8  }  of  parameter  estimates  by 

"T B 

repeating  the  following  double  step. 

B-step:  Evaluate  0(818^). 

M-step;  Choose  £  ■  8  to  maximum  Q ( 6 | 8^ ) . 

Consider  the  following  recursive  version. 

At  stage  k  +  1,  with  current  estimate  8^,  define 

V,<ei  -  E  do,  f(Vl|.)  lyM1)  ♦  ^<6,  .  (12) 

Choose  8  *  £.. .  to  maximize  L.  ..(8).  Finally,  estimate  jL  by  8  . 

—  — k+1  k+1  —  — 0  — n 

Both  the  EM  algorithm  and  its  recursive  version  may  be  used  in  Bayesian 
analysis  for  the  computation  of  posterior  modes.  In  (12)  we  can  initialize 
using 

LQ(8)  “  lo9  P<£)  » 

where  p(*)  is  the  prior  density  for  9,  with  mode  8^. 

Theorem  2.  Approximately,  given  appropriate  regularity,  recursion  (12)  can  be 
written  as 

!*♦,-!**  <(**'dc<vr’!s(wv  . 

which  is  the  recursion  we  called  (9)  in  Section  2. 
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Proof:  To  clarify  the  steps  we  omit  some  subscripts  and  rewrite  (12)  as 

-  *e-  {1°9  f(xli)ly}  +  Lk(i)  ' 

where  £'  maximizes  Lk(£). 

We  derive  the  recursion  while  showing,  inductively,  that  approximately 
for  £  near  £*, 

1^(8)  -  Lk(8M  -  j(£-£* )T{kIc(0')}(8-8')  . 

For  x  e  x(y),  define  the  conditional  density 

k(x|y,8)  ■  f(x|8)/g(y|8) 

Then  by  Taylor  expansion,  approximately, 
log  f (x| 8)  -  log  f(x|£')  +  <£-£,)TDe,  log  f(x|£' ) 

+  |(£-£' )D^,  log  f(x|0£)  •  (8-0' ) 

-  log  f(x|£')  +  (£-£’  )T{s(y,£* )  +  Dg,  log  k(x|y,£')} 

+  ^®-®'>T5e.  log  *  <®-®'>  • 

Given  appropriate  regularity, 

*0.  lo9  k(x|y,£' )ly}  =  0  , 

so  that,  approximately, 

Lk+1(8)  -  Eg , { log  f(x|£')|y}  +  1^(8')  +  ( 8-8* )TS(y,£’ ) 

-  i  (0-0' )T{(k+1 )I  ( 8 * ) } ( 0—9 * )  . 

The  maximizing  £  is 

|  -  0 '  +  {(k+1)Ic(0,)r1S(y,0')  , 

which  is  the  required  recursion. 

Also,  from  (13), 

LVJ.(0)  ■  c  +  (i-lj^ty,!1)  -  i  (£-£)T{(k+1)l  (£'  )}(£-£) 

-  (0-0){(k+1)lc(|)}(0-0')  , 

where  c  is  independent  of  £  , 


(13) 


(14) 
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.  C  _  1  (6-0)T{(k-H)I  (9')}(0-0),  from  (14)  , 

2 -  C  - - 

-  c  -  4  <9-e)T{(k+i)i  ( e ) } < e-e >  . 

Theorem  3.  in  exponential  family  models  in  which  £  is  the  expected  value  of 
the  sufficient  statistic,  the  recursion  is  exact. 

Proof;  Suppose  log  f(x|£)  *»  b(x)  +  tTj>(9)  +  a(£(0J)  where  t^  *  t(x)  is  a 
vector  of  sufficient  statistics  and 

l0(t)  -  9  . 

Then 


D0  log  f(x|9)  »  Ic(®Ht-8)  • 

Suppose  Dg  1.^(0.)  *  kl  (£)(•' '-8' ) .  This  certainly  holds  for  k  *  1.  Then  the 

stationarity  condition  for  is 

I  ( 9 ) ( t * -9 )  +  kl  (9) (6*-9)  -  0  ,  (15) 

c  —  —  ~  c  ~  ~ 

where  t1  ■  Kg, {log  f(x|0)|y}.  Thus,  if  all  information  matrices  are 
nonsingular. 


I  (9 • ) (t'-fl )  +  kl  (9»)(9'-8) 


c  — 


Z.  -1  jL  9 


lee. 


9  -  8*  +  {(k+1)I  ( 8* ) }~^l  (9')(t'-e') 
—  —  c  — ■  c  —  —  — 


-  9 •  ♦  {(k+1)Ic(8')}  S(y|8')  . 

In  fact,  from  (15),  (k+1)<)  ■  t*  ♦  k9',  so  l  at 

Sg  L*f1(->  “  (k+1)Ic(|)(8-0)  . 

These  results  can  be  illustrated  by  applying  recursion  ( 12 )  to  the 
examples.  In  3.1,  3.2  and  3.3,  we  obtain  exactly  the  same  formulae  as  with 
recursion  (9).  In  Example  3.4  the  recursion  on  9^  is  the  same  and  the 

others  differ  very  slightly  as  follows. 

(k+1)  (k)  (k)  (k) 

j  'yk+1  yk+1  j  ' 

.(k+1)  m  .(k)  f^(y  )}(y  -  U(K)  )2  -  (1  -  f^(y 

j  iyk+1  ;vyk+1  j  yk+l',?j  1  ' 

j  *  1,2,  where 


-<k),  ;  r.,ft(k)  >  (k),  .,-1  (k).  , 

<y)  -  {k9j  +  w^  (y)}  w^  (y) 
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Note  that  “  {k8^}  ^w^Cy),  for  large  k. 

Bayesian  versions  of  some  of  these  recursions  have  appeared  before:  that 
for  Example  3.3  (c.f.  (11))  In  Makov  and  Smith  (1977)  and  Smith  and  Makov 
(1978)  i  that  for  Example  3.4  in  Titterington  (1976). 

For  the  exponential  family  models  considered  in  Theorem  3  the  recursions 
have  particularly  simple  forms,  reminiscent  of  Example  1.2.  Recursion  (2)  is 

2J.1  -U  *  -  9J)  . 

Recursion  ( 8 )  is 

-9J  *  K-'(«<^1lYkM.9jl  -  •*>  • 
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5 .  DISCUSSION 


Although/  whenever  it  ie  relevant/  recursion  (2)  is  the  ideal  choice,  it 
is  likely  to  be  complicated  to  apply  in  large  problems.  There,  the 
modifications  of  recursions  (8)  and  (10)  promise  to  be  much  easier  in 
practice.  Only  a  few  examples  have  been  described  and,  apart  from  the 
mixtures  problems,  no  missing  data  example  has  been  discussed.  This  is 
rectified  in  Titterington  and  Jiang  (1982),  with  emphasis  on  exponential 
family  and,  in  particular,  multivariate  Normal  distributions.  There,  also, 
are  provided  numerical  details  about  the  relative  performance  of  same  of  the 
procedures,  which  is  an  important  aspect  of  the  study.  As  Makov  (1980)  points 
out  in  the  context  of  Example  3.3  with  d  ■  2,  recursion  (2)  *nay  b* 
unsatisfactorily  unstable,  relative  to  (8)  or  (9),  particularly  in  the  *rly 
stages. 

He  finish  with  a  final  s^;**nt  about  the  EM  algorithm.  Recursion  (2)  is 
related  co  the  method  of  Scoring,  which  generates  a  sequence  of  estimates 
{3  }  according  to  the  recursion 


A 

6 

-m+1 


where  y1 , . . . ,yn  denotes 
It  is  easy  to  show. 


+  {nl(6  )}_1  l  S(y  ,§  ),  m  - 
-m  A-1  -  1  “■ 

n  independent  observations, 
using  the  methods  of  Theorem  2, 


0,1 


,  •  • . 


that  the 


EM 


algorithm  is  given,  approximately,  by 

3,.i  -3  +  {nl(8  r1  l  s(y.,$  ),  m  -  0,1,  . 

-m+l  -m  c  -m  ^  —  i  -m 

Again,  for  the  exponential  family  case  of  Theorem  3,  the  iteration  is 

exact,  although  a  simpler  version,  of  course,  is 

$  '  -  n"1  l  t!B),  where  t!B)  -  (t  ly  ),  i  «  1,... ,n  . 

1  1  0  1  1 
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